Mapping Allen institute scRNA-seq reference to Visium spatial (mouse brain)

Contents:

Loading packages

THEANO_FLAGS='force_device=True' forces the package to use GPU. Pay attention to error messages that might indicate theano failed to initalise the GPU.
Do not forget to change device=cuda0 to your available GPU id. Use device=cuda / device=cuda0 if you have just one locally or if you are requesting one GPU via HPC cluster job.
You should see a message similar to the one below confirming that theano started using the GPU:

Using cuDNN version 7605 on context None
Mapped name None to device cuda0: Tesla V100-SXM2-32GB (0000:89:00.0)

Loading Visium data

First let's read spatial Visium data from 10X Space Ranger output. Here we load sample annotations.

Next we load the mRNA count for each Visium slide and corresponding histology images as a list slides and as a single anndata object adata. We need this redundancy because scanpy plotting over the histology image does not work with multiple sections.

Now let's look at QC: total number of counts and total number of genes per spot

Loading Allen single cell reference data

Mouse data:

cd /nfs/team205/vk7/sanger_projects/large_data/brains/mouse_allen/ && wget https://brainmapportal-live-4cc80a57cd6e400d854-f7fdcae.divio-media.net/filer_public/49/6e/496ea0c2-25ae-4523-bb84-8d51a63ab72e/readme_mouse.txt
cd /nfs/team205/vk7/sanger_projects/large_data/brains/mouse_allen/ && wget https://transcriptomic-viewer-downloads.s3-us-west-2.amazonaws.com/mouse/sample-annotations.zip
cd /nfs/team205/vk7/sanger_projects/large_data/brains/mouse_allen/ && wget https://transcriptomic-viewer-downloads.s3-us-west-2.amazonaws.com/mouse/transcriptome.zip
cd /nfs/team205/vk7/sanger_projects/large_data/brains/mouse_allen/ && wget https://transcriptomic-viewer-downloads.s3-us-west-2.amazonaws.com/mouse/dendrogram.zip
cd /nfs/team205/vk7/sanger_projects/large_data/brains/mouse_allen/ && wget https://transcriptomic-viewer-downloads.s3-us-west-2.amazonaws.com/mouse/taxonomy.zip
def extract_sparse_matrix(h5f, data_path):   
    r""" Read Allen data h5 file format into sparse matrix
    :param h5f: "h5f" is the handle that you get from "h5py.File('mouse/transcrip.tome')"
    :param data_path: "data_path" is the path within the archive.  In this case probably 
    either "/data/exon/" or "/data/intron/".
    """
    data = h5f[data_path]
    x = data['x']
    i = data['i']
    p = data['p']
    dims = data['dims']   

    sparse_matrix = ss.csc_matrix((x[0:x.len()], 
                                   i[0:i.len()], 
                                   p[0:p.len()]), 
                                  shape = (dims[0], dims[1]))    
    return sparse_matrix

# Open the file and import exon and intron data as sparse matrices
h5f = h5py.File(sc_data_folder + 'transcrip.tome')
exons = extract_sparse_matrix(h5f, '/data/exon/')
introns = extract_sparse_matrix(h5f, '/data/intron/')

# combine introns and exons
counts = introns + exons

# read cell and gene names
from re import sub
gene_names = pd.Series([sub("b\'|\'", '', str(i)) for i in np.array(h5f['gene_names'])])
sample_names = pd.Series([sub("b\'|\'", '', str(i)) for i in np.array(h5f['sample_names'])])

# read cell annotations
sample_meta = pd.DataFrame(np.array([[sub("b\'|\'", '', str(i)) for i in np.array(h5f['sample_meta']['anno'][i])] 
                                       for i in np.array(h5f['sample_meta']['anno'])]).T,
                           columns=np.array(h5f['sample_meta']['anno']),
                           index=sample_names)

# extract tSNE coordinates of cells
tsne = pd.DataFrame(np.array(h5f['projection']['tsne']))
tsne.index = [sub("b\'|\'", '', str(i)) for i in tsne['sample_name']]
tsne = tsne.reindex(index=sample_meta.index)
tsne = tsne[['x', 'y']].values

# create anndata object
adata_snrna_raw = anndata.AnnData(counts,obs=sample_meta)
adata_snrna_raw.var_names = gene_names
adata_snrna_raw.obsm['tsne'] = tsne

# remove cells with no TSNE coordinates
adata_snrna_raw = adata_snrna_raw[np.isnan(adata_snrna_raw.obsm['tsne']).sum(1) == 0,:]

# save all cells as h5ad
adata_snrna_raw.write(sc_data_folder + 'all.h5ad')
adata_snrna_raw = anndata.read(sc_data_folder + 'all.h5ad')

# cells present in our sections
plus_v1 = ['AI', 'AUD', 'GU', 'HIP', 'RSP', 'SSp', 'SSs', 'SUB-ProS', 'TEa-PERI-ECT']
plus_v2 = ['AI', 'AUD', 'ENT1', 'GU', 'HIP', 'RSP', 'SSp', 'SSs', 'SUB-ProS', 'TEa-PERI-ECT', 'VIS']

# cells that should not be in our sections
minus = ['PAR-POST-PRE']

# filter to the region we need & save subsets
plus_v1_minus_ind = adata_snrna_raw.obs['region_label'].isin(plus_v1+minus)
plus_v1_ind = adata_snrna_raw.obs['region_label'].isin(plus_v1)
plus_v2_minus_ind = adata_snrna_raw.obs['region_label'].isin(plus_v2+minus)
plus_v2_ind = adata_snrna_raw.obs['region_label'].isin(plus_v2)
SSp_ind = adata_snrna_raw.obs['region_label'].isin(['SSp'])
HIP_ind = adata_snrna_raw.obs['region_label'].isin(['HIP'])

adata_snrna_raw[plus_v1_minus_ind,:].write(sc_data_folder + 'plus_v1_minus_visium.h5ad')
adata_snrna_raw[plus_v1_ind,:].write(sc_data_folder + 'plus_v1_visium.h5ad')
adata_snrna_raw[plus_v2_minus_ind,:].write(sc_data_folder + 'plus_v2_minus_visium.h5ad')
adata_snrna_raw[plus_v2_ind,:].write(sc_data_folder + 'plus_v2_visium.h5ad')
adata_snrna_raw[SSp_ind,:].write(sc_data_folder + 'SSp_visium.h5ad')
adata_snrna_raw[HIP_ind,:].write(sc_data_folder + 'HIP_visium.h5ad')

Select SSp cell types

Select cell types confidently found only in PAR-POST-PRE

Add counts matrix as adata.raw

cell2location analysis

Here we show how to perform the first step in one function run - train cell2location model to learn cell locations. Results are shown below and saved to: